3  Continuous predictors

Regression for Linguists

Author: Daniela Palleschi
Affiliation: Humboldt-Universität zu Berlin
Published: October 2, 2023

This lecture is based on Ch. 5 (Correlation, Linear, and Nonlinear transformations) from Winter (2019).

Learning Objectives

Today we will learn…

  • how to log-transform continuous variables
  • why and how to centre continuous predictors
  • when and how to standardize continuous predictors

Set-up environment

# suppress scientific notation
options(scipen=999)

We’ll also need to load our required packages. Hopefully you’ve already installed the required packages (if not, go to Chapter 3).

# load libraries
pacman::p_load(
               tidyverse,
               here,
               broom,
               lme4,
               janitor,
               languageR)

3.1 Load data

df_freq <- read_csv(here("data", "ELP_frequency.csv")) |> 
  clean_names()

Reminder of our variables:

summary(df_freq)
     word                freq               rt       
 Length:12          Min.   :    4.0   Min.   :507.4  
 Class :character   1st Qu.:   57.5   1st Qu.:605.2  
 Mode  :character   Median :  325.0   Median :670.8  
                    Mean   : 9990.2   Mean   :679.9  
                    3rd Qu.: 6717.8   3rd Qu.:771.2  
                    Max.   :55522.0   Max.   :877.5  

4 Summary

In the last lectures we saw that the equation for a straight line boils down to its intercept and slope, and that linear regression fits a line to our data. This line results in predicted/fitted values, which fall along the line, and residuals, which are the difference between our observed values and the fitted values.

We also learned about two model assumptions: normality of residuals, and constant variance of residuals. We learned that we can plot these with histograms or Q-Q plots (normality), and residual plots (constant variance).

Now that we understand what a simple linear model does, we can take a step back and focus on what we put into the model. So far we’ve looked at reaction times (milliseconds) as a function of word frequency. However, we don’t typically feed raw continuous data into a model, because most continuous linguistic variables are not normally distributed, and so a straight line will not fit them very well (because there will be greater variance at higher values).

5 Continuous predictors

Linear transformations refer to constant changes across values that do not alter the relationship between those values. For example, adding, subtracting, or multiplying by a constant value will not alter the differences between values. Think of the example in the last lecture on the relationship between heights and ages as a function of the measurement unit: the relationship between all the values did not change, because the difference between heights in millimeters, centimeters, and meters is constant, as is the difference between ages in days, months, or years. We’ll now look at some common ways of linearly transforming our data, and the reasons for doing so.
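To see this invariance concretely, here is a small sketch with made-up height and age values (not from the lecture data): changing the unit of measurement is a linear transformation, so the correlation between the variables is unchanged.

```r
# linear transformations change units, not relationships:
# the correlation is the same in any unit (toy data for illustration)
set.seed(42)
height_cm <- rnorm(10, mean = 170, sd = 10)
age_years <- rnorm(10, mean = 30, sd = 5)

height_mm  <- height_cm * 10   # same heights, different unit
age_months <- age_years * 12   # same ages, different unit

all.equal(cor(height_cm, age_years), cor(height_mm, age_months))  # TRUE
```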

5.1 Centering

Centering is typically applied to predictor variables. Centering refers to subtracting the mean of a variable from each value, so that each centered value represents the original value’s deviation from the mean (i.e., a mean-deviation score). What would a centered value of \(0\) represent in terms of the original values?

Let’s try centering our frequency values. To create a new variable (or alter an existing variable), we can use the mutate() function from dplyr.

# add centered variable
df_freq <- 
  df_freq |> 
  mutate(freq_c = freq-mean(freq))

This can also be done with base R, but it’s a lot more verbose.

# add centered variable with base R
df_freq$freq_c <- df_freq$freq-mean(df_freq$freq)
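A quick sanity check (with a toy vector, not the lecture data) answers the question above: a centered variable always has mean zero, so a centered value of 0 corresponds to the mean of the original values.

```r
# centering: subtract the mean from every value
x   <- c(2, 4, 9)
x_c <- x - mean(x)   # mean(x) is 5, so x_c is -3, -1, 4
mean(x_c)            # 0 (up to floating-point error)
```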

Now let’s fit our models.

# run our model with the original predictor
fit_rt_freq <- 
  lm(rt ~ freq, data = df_freq)
# run our model with the centered predictor
fit_rt_freq_c <- 
  lm(rt ~ freq_c, data = df_freq)

If we compare the coefficients from fit_rt_freq and fit_rt_freq_c, what do we see? The only difference is the intercept values: 713.71 (uncentered) and 679.92 (centered).

mean(df_freq$rt)
[1] 679.9167

The intercept for a centered continuous predictor corresponds to the predicted response at the mean of that predictor, which in simple regression equals the mean of the response variable. This is crucial for interpreting interaction effects, which we will discuss tomorrow. For more detail on interpreting interactions, see Chapter 8 in Winter (2019) (we won’t be discussing this chapter as a whole).
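The same pattern can be verified on simulated data (a sketch, not the lecture dataset): centering shifts the intercept to the mean of the response while leaving the slope untouched.

```r
# centering moves the intercept but does not change the slope
set.seed(1)
x <- rnorm(20, mean = 50, sd = 10)   # made-up predictor
y <- 2 * x + rnorm(20)               # made-up response

fit   <- lm(y ~ x)                   # raw predictor
fit_c <- lm(y ~ I(x - mean(x)))      # centered predictor

unname(coef(fit)[2]) - unname(coef(fit_c)[2])   # ~0: slopes are identical
unname(coef(fit_c)[1])                          # the centered intercept...
mean(y)                                         # ...equals mean(y)
```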

5.2 Standardizing (z-scoring)

We can also standardize continuous predictors by dividing centered values by the standard deviation of the sample. Let’s look at our frequency/reaction time data again.

First, what are our mean and standard deviation? This will help us understand the changes to our variables as we center and standardize them.

mean(df_freq$freq)
[1] 9990.167
sd(df_freq$freq)
[1] 18558.69

What are the first six values of freq in the original scale?

df_freq$freq[1:6]
[1] 55522 40629 14895  3992  3850   409

What are the first six values of freq_c in the centered scale? These should be the values of freq minus the mean of freq (which we saw above is 9990.17).

df_freq$freq_c[1:6]
[1] 45531.833 30638.833  4904.833 -5998.167 -6140.167 -9581.167

Now, let’s create our standardised z-scores for frequency by dividing these centered values by the standard deviation of freq (which is the same as the standard deviation of freq_c), which we saw is 18558.69. Again, this can be done with mutate() from dplyr, or with base R syntax.

# standardise using the tidyverse
df_freq <- 
  df_freq |> 
  mutate(freq_z = freq_c/sd(freq))
# standardize with base R
df_freq$freq_z <- df_freq$freq_c/sd(df_freq$freq)
head(df_freq)
# A tibble: 6 × 5
  word      freq    rt freq_c freq_z
  <chr>    <dbl> <dbl>  <dbl>  <dbl>
1 thing    55522  622. 45532.  2.45 
2 life     40629  520. 30639.  1.65 
3 door     14895  507.  4905.  0.264
4 angel     3992  637. -5998. -0.323
5 beer      3850  587. -6140. -0.331
6 disgrace   409  705  -9581. -0.516
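As an aside (not covered in the lecture text): base R’s scale() function performs both steps, centering and dividing by the standard deviation, in a single call. It returns a one-column matrix, so [, 1] converts the result back to a plain vector.

```r
# scale() centers and standardizes in one step (toy values for illustration)
x <- c(10, 20, 60)
z_manual <- (x - mean(x)) / sd(x)   # the two-step version from above
z_scale  <- scale(x)[, 1]           # drop the matrix dimension
all.equal(z_scale, z_manual)        # TRUE
```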

6 Continuous responses

This is really the meat and potatoes of dealing with continuous variables. In linguistic research, we often work with positively skewed response variables, such as reaction times, which are commonly log-transformed.

6.1 Log-transformation
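A minimal sketch (with hypothetical reaction times, since this section is still a stub): log-transforming compresses large values, pulling in the long right tail typical of RT distributions, and the transformation is reversible via exp().

```r
# log-transforming reaction times (hypothetical values in ms)
rt     <- c(450, 520, 610, 880, 1500)
log_rt <- log(rt)     # natural logarithm
round(log_rt, 2)
exp(log_rt)           # back-transform recovers the raw values
```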

Important terms


7 Centring continuous predictors

N.B., you would usually also centre numeric predictors. This is done by subtracting some constant from every value (usually by subtracting the mean of the predictor from each value and saving this as a new variable):

df_example %>% 
  mutate(predictor_c = predictor - mean(predictor))

If you have interval data with a specific upper and lower bound, you could alternatively subtract the median value.
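For instance (a hypothetical 7-point rating scale, not from the lecture data), median-centering makes 0 represent the median response rather than the mean:

```r
# median-centering bounded interval data (hypothetical 1-7 ratings)
rating   <- c(1, 2, 4, 6, 7)
rating_c <- rating - median(rating)   # median is 4, so 0 = the median rating
rating_c                              # -3 -2  0  2  3
```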

8 Standardising

Learning Objectives

Today we learned…

9 Task

References

Winter, B. (2019). Statistics for Linguists: An Introduction Using R. Routledge. https://doi.org/10.4324/9781315165547